Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters

Abstract

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. Benchmark results are presented for three current x86-based microprocessors, showing clearly that our optimization works best on designs with high-speed shared caches and low memory bandwidth per core. We furthermore demonstrate that simple bandwidth-based performance models are inaccurate for this kind of algorithm and employ a more elaborate, synthetic modeling procedure. Finally, we show that temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment, albeit with limited benefit at strong scaling.
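To make the idea of temporal blocking concrete, the following is a minimal single-threaded sketch, not the pipelined shared-cache scheme the abstract describes. It assumes a 1D 3-point Jacobi stencil with fixed boundary values and uses a simpler overlapped variant: each spatial block is copied with a halo as wide as the number of time steps, all time steps are applied while the block would still be cache-resident, and the halo points are recomputed redundantly. The function names and the stencil are illustrative assumptions and do not come from the paper.

```python
def jacobi_step(src, dst, lo, hi):
    """One 3-point Jacobi update of dst[lo:hi], reading from src."""
    for i in range(lo, hi):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0


def jacobi_sweeps_naive(u, steps):
    """Reference: `steps` full sweeps over the whole array (hypothetical name)."""
    n = len(u)
    a, b = list(u), list(u)          # end points stay fixed in both buffers
    for _ in range(steps):
        jacobi_step(a, b, 1, n - 1)
        a, b = b, a
    return a


def jacobi_sweeps_blocked(u, steps, block):
    """Overlapped temporal blocking (hypothetical name).

    Each block of interior points [s, e) is advanced by all `steps`
    time steps at once on a small local copy that includes a halo.
    """
    n = len(u)
    out = list(u)
    for s in range(1, n - 1, block):
        e = min(s + block, n - 1)
        lo = max(s - steps, 0)       # halo of width `steps` on each side,
        hi = min(e + steps, n)       # clipped at the physical boundary
        a = u[lo:hi]
        b = list(a)
        m = len(a)
        for t in range(1, steps + 1):
            # Next to a cut (non-physical) edge the valid region shrinks by
            # one point per time step; at a physical edge it does not.
            left = 1 if lo == 0 else t
            right = m - 1 if hi == n else m - t
            jacobi_step(a, b, left, right)
            a, b = b, a
        out[s:e] = a[s - lo:e - lo]  # only the block interior is written back
    return out
```

Both functions produce the same result. The blocked version reads each block from main memory once per `steps` updates instead of once per update, at the cost of redundant work in the halos; a pipelined scheme such as the one the abstract introduces aims to avoid that kind of boundary overhead.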
